ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection
Most advances in medical lesion detection networks are limited to subtle modifications of conventional detection networks designed for natural images. However, there exists a vast domain gap between medical images and natural images, and medical image detection often suffers from several domain-specific challenges, such as high lesion/background similarity, dominant tiny lesions, and severe class imbalance. Is a hand-crafted detection network tailored for natural images undoubtedly good enough for a discrepant medical lesion domain? Are there more powerful operations, filters, and sub-networks that better fit the medical lesion detection problem yet to be discovered? In this paper, we introduce a novel ElixirNet that includes three components: 1) a TruncatedRPN that balances positive and negative data for false positive reduction; 2) an Auto-lesion Block that is automatically customized for medical images to incorporate relation-aware operations among region proposals, leading to more suitable and efficient classification and localization; 3) a Relation transfer module that incorporates the semantic relationship and transfers relevant contextual information via an interpretable graph, thus alleviating the lack of annotations for all types of lesions. Experiments on DeepLesion and Kits19 prove the effectiveness of ElixirNet, which improves both sensitivity and precision over FPN with fewer parameters.
Comment: 7 pages, 5 figures, AAAI202
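The Auto-lesion Block itself is found by architecture search, so its exact operations are not given in the abstract; the relation-aware idea it names can, however, be illustrated with ordinary self-attention over pooled region-proposal features. The sketch below is only such an illustration, assuming PyTorch, with an invented module name, feature dimension, and head count; it is not the paper's discovered block.

```python
# Minimal sketch of a relation-aware operation over region-proposal features,
# in the spirit of ElixirNet's Auto-lesion Block. The actual block is discovered
# by architecture search; this is plain self-attention over proposals, and the
# module name, feature dimension, and head count are illustrative assumptions.
import torch
import torch.nn as nn

class ProposalRelationLayer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Each proposal attends to every other proposal, so contextual cues
        # (e.g. similar-looking nearby tissue) can inform classification.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, roi_feats):            # roi_feats: (batch, num_proposals, dim)
        ctx, _ = self.attn(roi_feats, roi_feats, roi_feats)
        return self.norm(roi_feats + ctx)    # residual keeps per-proposal identity

# Usage: refine 512 pooled RoI feature vectors before the detection heads.
feats = torch.randn(1, 512, 256)
refined = ProposalRelationLayer()(feats)     # -> (1, 512, 256)
```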
CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks. However, due to the scarcity of text-3D data pairs, adapting the success of 2D Vision-Language Models (VLM) to the 3D space remains an open problem. Existing works that leverage VLMs for 3D understanding generally resort to constructing intermediate 2D representations for the 3D data, but at the cost of losing 3D geometry information. To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP²) to directly learn a transferable 3D point cloud representation in realistic scenarios with a novel proxy alignment mechanism. Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios and build well-aligned, instance-based text-image-point proxies from those complex scenarios. On top of that, we propose a cross-modal contrastive objective to learn semantic- and instance-level aligned point cloud representations. Experimental results on both indoor and outdoor scenarios show that our learned 3D representation has great transfer ability in downstream tasks, including zero-shot and few-shot 3D recognition, boosting state-of-the-art methods by large margins. Furthermore, we provide analyses of the capability of different representations in real scenarios and present an optional ensemble scheme.
Comment: To appear at CVPR 202
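The cross-modal contrastive objective mentioned above can be pictured as a symmetric InfoNCE loss between point-cloud embeddings and their paired text or image proxy embeddings. The sketch below is a minimal version under assumptions of my own (a fixed temperature, L2-normalized encoder outputs, one proxy per point instance) and is not the paper's exact formulation.

```python
# A minimal sketch of a symmetric cross-modal contrastive (InfoNCE) objective
# aligning point-cloud embeddings with paired text/image proxy embeddings.
# The temperature and normalization choices are assumptions, not the paper's.
import torch
import torch.nn.functional as F

def proxy_contrastive_loss(point_emb, proxy_emb, temperature=0.07):
    # point_emb, proxy_emb: (N, D) embeddings for N matched instance proxies.
    point_emb = F.normalize(point_emb, dim=-1)
    proxy_emb = F.normalize(proxy_emb, dim=-1)
    logits = point_emb @ proxy_emb.t() / temperature      # (N, N) similarities
    targets = torch.arange(point_emb.size(0), device=point_emb.device)
    # Symmetric: each point should match its own proxy, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# A point-text term and a point-image term would each use this loss and be summed.
```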
Co-Creating for Locality and Sustainability: Design-Driven Community Regeneration Strategy in Shanghai’s Old Residential Context
Community regeneration has drawn much attention in both the urban development and sustainable design fields in the last decade. As a response to the regeneration challenges of Shanghai's old and high-density communities, this article proposes two design-driven strategies: enabling residents to become innovation protagonists and facilitating collaborative entrepreneurial clusters based on the reorganization of community resources. Two ongoing collaborative projects between the Siping community and Tongji University—Open Your Space microregeneration (OYS) and the Neighborhood of Innovation, Creativity, and Entrepreneurship Towards 2035 (NICE 2035) living labs project—are adopted as the main case studies. Research findings are put forward through a structured analysis of qualitative data. Firstly, we reviewed the situation and sustainability goals of Shanghai's old residential communities and how design-centric social innovation and collaboration can serve as effective interventions. Secondly, we analyzed resident empowerment approaches to decision-making, co-design, and co-management processes in OYS through participatory observation. Finally, through participant interviews and key-event analysis in NICE 2035, we investigated how living labs reuse distributed community resources to develop lifestyle-based business prototypes. This article thus proposes a co-creation mechanism and action guides towards localized and sustainable community regeneration, which can provide a contextual paradigm for similar challenges.
How to Save your Annotation Cost for Panoptic Segmentation?
How to properly reduce the annotation cost for panoptic segmentation? How to leverage and optimize the cost-quality trade-off between training data and the model? These questions are key challenges towards a label-efficient and scalable panoptic segmentation system because of its expensive instance/semantic pixel-level annotation requirements. By closely examining different kinds of cheaper labels, we introduce a novel multi-objective framework to automatically determine the allocation of different annotations, so as to reach better segmentation quality at lower annotation cost. Specifically, we design a Cost-Quality Balanced Network (CQB-Net) to generate the panoptic segmentation map, which distills the crucial relations between various supervisions, including panoptic labels, image-level classification labels, bounding boxes, and the semantic coherence between foreground and background. Instead of ad-hoc allocation during training, we formulate the optimization of the cost-quality trade-off as a Multi-Objective Optimization Problem (MOOP). We model the marginal quality improvement of each annotation and approximate the Pareto front to enable a label-efficient allocation ratio. Extensive experiments on the COCO benchmark show the superiority of our method, e.g., achieving a segmentation quality of 43.4% compared to 43.0% for OCFusion while saving 2.4x annotation cost.
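The allocation step described above is a multi-objective optimization over annotation cost and segmentation quality with a Pareto-front approximation. The sketch below illustrates that idea with a brute-force Pareto filter over a grid of hypothetical allocations; the per-label costs and the diminishing-returns quality model are placeholders of my own, not CQB-Net's learned marginal-quality estimates.

```python
# Minimal sketch of choosing non-dominated (Pareto-optimal) annotation
# allocations over a cost/quality trade-off. Costs and the quality model
# are made-up placeholders; the paper learns marginal quality improvements.
from itertools import product

def dominates(a, b):
    """a dominates b if it costs no more, scores no less, and is strictly better on one."""
    return (a["cost"] <= b["cost"] and a["quality"] >= b["quality"]
            and (a["cost"] < b["cost"] or a["quality"] > b["quality"]))

def pareto_front(candidates):
    """Keep allocations that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical per-image annotation costs (e.g. minutes per label type).
COST = {"panoptic": 19.0, "box": 0.7, "image_tag": 0.1}

def estimate_quality(alloc):
    # Placeholder diminishing-returns quality model over image counts.
    weights = {"panoptic": 30.0, "box": 8.0, "image_tag": 2.0}
    return sum(weights[k] * (n ** 0.5) for k, n in alloc.items())

# Enumerate a coarse grid of allocations (number of images per label type).
candidates = []
for counts in product(range(0, 101, 25), repeat=3):
    alloc = dict(zip(COST, counts))
    candidates.append({
        "alloc": alloc,
        "cost": sum(COST[k] * n for k, n in alloc.items()),
        "quality": estimate_quality(alloc),
    })

best = pareto_front(candidates)   # non-dominated cost/quality allocations
```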